Langchain Vector Search Tool uses MCP under the hood#295

Open
nisha2003 wants to merge 11 commits into mcp-migration from langchain-vs-uses-mcp-adapters

Conversation

Contributor

nisha2003 commented on Jan 27, 2026

Migrate the Langchain Vector Search Tool to use MCP adapters. The direct API path is preserved for self-managed embeddings, which MCP does not support.

Moved MCP-path code that was duplicated between the OpenAI and Langchain tools into the base mixin class, and moved the tests for that shared functionality to the mixin class as well.

Manual tests: see the "Langchain Migration" section of https://eng-ml-inference.staging.cloud.databricks.com/editor/notebooks/1465545330011655?o=1653573648247579

nisha2003 changed the title from "changes" to "Langchain Vector Search Tool uses MCP under the hood" on Jan 27, 2026
Contributor

aravind-segu left a comment

A couple of comments, but looks good overall. Will stamp after E2E testing.

    ) -> Dict[str, Any]:
        """Build input for MCP tool invocation."""
        mcp_input = self._build_mcp_params(filters, **kwargs)
        mcp_input["query"] = query
Contributor

nit: why special-case this? Can we pass query into _build_mcp_params instead?

        query: str,
        filters: Optional[Union[Dict[str, Any], List[FilterItem]]] = None,
        **kwargs,
    ) -> List[Document]:
Contributor

Just curious here: previously _run was declared to return a str, and now it returns a List[Document]. Did we change something, or was the previous return type wrong?

Contributor Author

I think it may have been wrong. Previously we just returned self._vector_store.similarity_search(**kwargs), whose return type is List[Document].

            return filters
        return {item.model_dump()["key"]: item.model_dump()["value"] for item in filters}

    def _build_mcp_meta(
Contributor

nit: use _build_mcp_params directly?

        """Build metadata dict for MCP tool invocation."""
        return self._build_mcp_params(filters, **kwargs)

    def _parse_mcp_response(self, mcp_response: str) -> List[Dict]:
Contributor

nit: same thing here, use _parse_mcp_response_to_dicts directly?

        self,
        query: str,
        filters: Optional[Union[Dict[str, Any], List[FilterItem]]] = None,
        openai_client: OpenAI = None,
Contributor

Can we use DatabricksOpenAI here, so that authentication happens automatically through WorkspaceClient?

    def _validate_mcp_tools(self, tools: list) -> None:
        """Validate that MCP tools were returned."""
        if not tools:
            raise ValueError(f"No MCP tools found for index {self.index_name}")
Contributor

Can we also add the other validation, that the tools list has exactly one entry? We already do that in the OpenAI integration.

Contributor Author

Yep, added.
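
For readers following the thread, the combined check requested above could look roughly like the sketch below. It is written as a standalone function so it runs on its own; the name validate_single_mcp_tool, the index_name argument, and the second error message are illustrative assumptions, not the code that was actually added to the PR.

```python
from typing import Any, Sequence


def validate_single_mcp_tool(tools: Sequence[Any], index_name: str) -> Any:
    """Return the only MCP tool for an index, or raise if there is not exactly one.

    Hypothetical helper: the name and the messages are assumptions for illustration.
    """
    # Same check as in the diff context: an empty result is an error.
    if not tools:
        raise ValueError(f"No MCP tools found for index {index_name}")
    # Additional check requested in review: more than one tool is also an error.
    if len(tools) != 1:
        raise ValueError(
            f"Expected exactly 1 MCP tool for index {index_name}, got {len(tools)}"
        )
    return tools[0]
```

Returning the single tool, rather than None as in the method above, is a design choice of this sketch: it lets the caller validate and unpack in one step.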

bbqiu removed their request for review on January 28, 2026 06:56
nisha2003 requested review from aravind-segu and removed the request for annzhang-db and sunishsheth2009 on January 29, 2026 00:53
            return [{"page_content": mcp_response, "metadata": {}}]

        if not isinstance(parsed, list):
            if strict:
Contributor

nit: It looks like strict is always True, do we need this parameter?

        results = []
        for item in parsed:
            if isinstance(item, dict):
                page_content = item.get(text_col, str(item))
Contributor

Here, if text_col is not in the item, we stringify the whole item. Maybe we should raise a clear error instead? Are there cases where the text column won't exist?
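
One way the stricter behavior suggested above might look is sketched below. This is not the PR's implementation: the function name, the error messages, the choice to put all remaining columns into metadata, and the raw-text fallback for non-JSON responses (which the diff context appears to do, though that is an assumption) are all illustrative.

```python
import json
from typing import Any, Dict, List


def parse_mcp_response_to_dicts(mcp_response: str, text_col: str) -> List[Dict[str, Any]]:
    """Turn a JSON list of row objects into page_content/metadata dicts.

    Hypothetical sketch: raises instead of stringifying a row without text_col.
    """
    try:
        parsed = json.loads(mcp_response)
    except json.JSONDecodeError:
        # Assumed fallback: treat a non-JSON response as a single raw-text result.
        return [{"page_content": mcp_response, "metadata": {}}]

    if not isinstance(parsed, list):
        raise ValueError(f"Expected a JSON list, got {type(parsed).__name__}")

    results: List[Dict[str, Any]] = []
    for position, item in enumerate(parsed):
        if not isinstance(item, dict):
            raise ValueError(f"Result {position} is not a JSON object: {item!r}")
        if text_col not in item:
            # Fail loudly rather than falling back to str(item).
            raise ValueError(
                f"Result {position} has no text column {text_col!r}; "
                f"available columns: {sorted(item)}"
            )
        # Every column other than the text column is kept as metadata.
        metadata = {key: value for key, value in item.items() if key != text_col}
        results.append({"page_content": item[text_col], "metadata": metadata})
    return results
```

Because non-list and missing-column cases always raise here, this sketch has no strict flag at all.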

        params: Dict[str, Any] = {}

        if query is not None:
            params["query"] = query
Contributor

query is required in the MCP params. Let's not accept a None query.
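
A minimal sketch of a params builder that treats query as required is shown below. The function name, the decision to also reject empty or whitespace-only strings, and the "filters" key are assumptions made for illustration; which parameters the Vector Search MCP tool actually accepts is not established in this thread.

```python
from typing import Any, Dict, Optional


def build_mcp_params(query: str, filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the MCP tool input, rejecting a missing or empty query.

    Hypothetical sketch: the 'filters' key is an assumed parameter name.
    """
    # query is mandatory, so validate it up front instead of skipping it when None.
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")

    params: Dict[str, Any] = {"query": query}
    # Only include filters when the caller actually supplied some.
    if filters:
        params["filters"] = filters
    return params
```

Taking query as a positional argument also removes the need to set it separately after the params dict has been built.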

        if query is not None:
            params["query"] = query

        num_results = kwargs.pop("num_results", kwargs.pop("k", self.num_results))
Contributor

Wait, I don't think we support num_results and the other args in our VectorSearchMCP right now. Looking at the code, we only return a single parameter, query?

https://github.com/databricks-eng/universe/blob/a89d0d891d7eb2ca1a282634ebbd0d853e5dddf1/langchain/langchain-core/src/handlers/VectorSearchMcpHandler.scala#L128-L135

2 participants